chore: promote staging to staging-promote/ec04354c-23271447493 (2026-03-19 04:37 UTC) by ironclaw-ci[bot] · Pull Request #1396 · nearai/ironclaw

ironclaw-ci · 2026-03-19T04:37:47Z

Auto-promotion from staging CI

Batch range: 428303af1128e7f124ad623fc1338393a4d06fcc..3dcccc1e64ea92fef2a44cf413b7cf974821da96
Promotion branch: staging-promote/3dcccc1e-23280048384
Base: staging-promote/ec04354c-23271447493
Triggered by: Staging CI batch at 2026-03-19 04:37 UTC

Commits in this batch (22):

fe53f69 chore: promote staging to staging-promote/57c397bd-23120362128 (2026-03-16 05:35 UTC) (#1236)
1ad1335 chore: release v0.19.0 (#973)
ef5715c fix: mark ironclaw_safety unpublished in release-plz (#1286, "fix: align release-plz publish config for ironclaw_safety")
7a4673c chore: update WASM artifact SHA256 checksums [skip ci] (#1297, "chore: update WASM artifact checksums and version-pinned URLs")
e9b0823 fix(setup): remove nonexistent webhook secret command hint (#1349)
bedc71e fix(llm): cap retry-after delays (#1351)
33a2dd2 fix(telegram): preserve polling after secret-blocked updates (#1353)
0be5910 fix(mcp): retry after missing session id errors (#1355)
9286978 chore(ci): add coverage gates via codecov.yml (#1228, "Add coverage gates via codecov.yml configuration") (#1291)
2d0b195 feat: upgrade MiniMax default model to M2.7 (#1357)
07e6e30 fix: add debug_assert invariant guards to critical code paths (#1312)
f2cd1d3 docs: add Japanese README (#1306)
2020270 Fix duplicate LLM responses for matched event routines (#1275)
42ffefa fix: remove -x from coverage pytest to prevent suite-blocking failures (#1360, "fix: e2e coverage suite reliability")
6831bb4 fix: full_job routine concurrency tracks linked job lifetime (#1372)
14abd60 fix: full_job routine runs stay running until linked job completion (#1374)
ec04354 fix: address valid review comments from PR #1359 ("chore: promote staging to main (2026-03-18 16:22 UTC)") (#1380)
4566181 feat(gateway): unified settings page with subtabs (#1191)
b7a1edf fix: remove debug_assert guards that panic on valid error paths (#1385)
8b15f8b feat(telegram): support auto split large message (#1084)
c8ee55e feat(testing): add FaultInjector framework for StubLlm (#1233)
3dcccc1 feat(self-repair): wire stuck_threshold, store, and builder (#712)

Current commits in this promotion (5)

Current base: staging-promote/ec04354c-23271447493
Current head: staging-promote/3dcccc1e-23280048384
Current range: origin/staging-promote/ec04354c-23271447493..origin/staging-promote/3dcccc1e-23280048384

8b15f8b feat(telegram): support auto split large message (#1084)
c8ee55e feat(testing): add FaultInjector framework for StubLlm (#1233)
3dcccc1 feat(self-repair): wire stuck_threshold, store, and builder (#712)
b9e5acf fix: add missing builder field and update E2E extensions tab navigation (#1400, "fix: resolve CI failures from missing builder field and stale E2E selector")
07c6ca7 fix: navigate telegram E2E tests to channels subtab (#1408)

Auto-updated by staging promotion metadata workflow

Waiting for gates:

Tests: pending
E2E: pending
Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

* feat(telegram): support auto split large message

* fix(telegram): strengthen split_message test assertion

Replace word-by-word contains check with assert_eq! on rejoined chunks, ensuring split_message preserves content exactly. send_response is still used (lines 745, 753) so it is intentionally kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(telegram): add missing split_message tests and document limitations

- Add test for sentence-boundary splitting
- Add test for hard-cut on pathological input (no spaces)
- Add test for multi-byte character safety (emoji)
- Document CJK sentence punctuation limitation
- Document trim behavior at chunk boundaries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hans <me@hans00.me>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(testing): add FaultInjector framework for StubLlm (#1220)

Adds a configurable fault injection framework for testing retry, failover, and circuit breaker behavior. The FaultInjector attaches to StubLlm and provides per-call control over failure type, timing, and sequencing.

Components:
- FaultType: maps to LlmError variants (RequestFailed, RateLimited, AuthFailed, InvalidResponse, IoError, ContextLengthExceeded, SessionExpired)
- FaultAction: Succeed, Fail(FaultType), Delay(Duration)
- FaultMode: SequenceOnce (play then succeed), SequenceLoop (repeat forever), Random (seeded xorshift64 PRNG for reproducibility)
- FaultInjector: thread-safe (AtomicU32 counter + Mutex RNG)

Integration:
- StubLlm gains optional fault_injector field via with_fault_injector()
- When set, takes precedence over should_fail/error_kind
- Backward compatible: existing StubLlm usage unchanged

Closes #1220

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(testing): address review feedback on FaultInjector

- Remove redundant .abs() in random fault comparison
- Extract check_faults() helper to DRY up StubLlm methods
- Guard xorshift seed=0 (fixed point) by mapping to 1
- Add StubLlm integration test (stub_llm_fault_injector_sequence)
- Remove dead seed field from FaultMode::Random
- Move pub mod fault_injection to top of mod.rs
- Add Debug impl for FaultInjector
- Add empty_sequence_always_succeeds test
- Add random_seed_zero_does_not_always_fail test

* fix(testing): address #1233 review -- seed-0 bug, reset(), Debug derive

- Store seed in FaultMode::Random so reset() can re-init the RNG
- Add reset() method for test reproducibility (re-seeds RNG, zeros counter)
- Strengthen seed=0 regression test to 100 iterations with stricter assertion
- Add reset_restores_random_rng_from_stored_seed test
- Debug impl and empty_sequence test were already present from prior commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger new run with skip-regression-check label

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(testing): address PR #1233 review -- error_rate validation and edge cases

- Validate error_rate is in 0.0..=1.0 and not NaN (panics on invalid input)
- Fix error_rate==1.0 edge case: use <= instead of < so 1.0 always fails
- Add regression tests for error_rate validation (NaN, negative, >1.0)
- Add tests for error_rate boundary values (0.0 never fails, 1.0 always fails)
- Add delay action test using tokio::time::pause() for deterministic timing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
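The seed-0 guard, error_rate validation, and reset() behavior described in these commits can be illustrated with a minimal sketch. `RandomFaults` and its methods are hypothetical stand-ins, not the actual FaultInjector API. The sketch also draws its sample from the half-open interval [0, 1) and compares with `<`, which differs from the `<=` comparison the commit mentions but gives the same boundary behavior (0.0 never fails, 1.0 always fails).

```rust
use std::sync::Mutex;

/// One xorshift64 step. The state must never be zero: zero is a fixed
/// point of the shifts and XORs, so the generator would emit zeros forever.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Hypothetical stand-in for a seeded random fault mode (not ironclaw's type).
pub struct RandomFaults {
    seed: u64,
    error_rate: f64,
    rng: Mutex<u64>,
}

impl RandomFaults {
    /// Panics if `error_rate` is NaN or outside 0.0..=1.0.
    pub fn new(seed: u64, error_rate: f64) -> Self {
        assert!(
            !error_rate.is_nan() && (0.0..=1.0).contains(&error_rate),
            "error_rate must be within 0.0..=1.0"
        );
        // Map the degenerate seed 0 to 1 so the generator can advance.
        let seed = if seed == 0 { 1 } else { seed };
        Self { seed, error_rate, rng: Mutex::new(seed) }
    }

    /// Draws a uniform sample in [0, 1) and fails when it is below the rate.
    pub fn should_fail(&self) -> bool {
        let mut state = self.rng.lock().unwrap();
        // Keep the top 53 bits so the value converts to f64 exactly.
        let sample = (xorshift64(&mut *state) >> 11) as f64 / (1u64 << 53) as f64;
        sample < self.error_rate
    }

    /// Re-seeds the generator from the stored seed so a test can replay
    /// the same fault sequence.
    pub fn reset(&self) {
        *self.rng.lock().unwrap() = self.seed;
    }
}
```

Storing the seed alongside the live RNG state is what makes reset() possible; without it, a second run of the same test could not reproduce the first run's faults.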

* feat(self-repair): wire stuck_threshold, store, and builder (#647)

Wire the previously dead-code fields in DefaultSelfRepair:
- stuck_threshold: detect_stuck_jobs() now filters by duration, only reporting jobs stuck longer than the configured threshold
- with_store(): wired in agent_loop.rs from AgentDeps.store for tool failure tracking via Database trait
- with_builder(): wired from register_builder_tool() return value through AppComponents and AgentDeps for automatic tool rebuilding
- tools: passed alongside builder for hot-reload logging

Remove all #[allow(dead_code)] annotations. Add regression tests for threshold-based filtering (both above and below threshold).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing `builder` field to AgentDeps in gateway workflow harness

After rebase onto staging, AgentDeps gained a `builder` field for self-repair tool rebuilding. The gateway workflow test harness was missing this field, causing CI compilation failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: retrigger CI

* fix: force CI refresh after path_routing_tests dedup

* test: add E2E test for stuck job repair and tool rebuild cycle

Tests the full self-repair flow requested in review:
1. Job transitions Pending -> InProgress -> Stuck
2. detect_stuck_jobs() finds it (zero threshold)
3. repair_stuck_job() recovers it back to InProgress
4. A broken tool is repaired via MockBuilder
5. Verify builder was invoked and repair succeeded

Uses a MockBuilder (impl SoftwareBuilder) that returns successful BuildResult without requiring an LLM or filesystem. Uses libsql test database for the store (increment_repair_attempts, mark_tool_repaired).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(self-repair): measure stuck_duration from Stuck transition, not started_at

- Use ctx.transitions to find the most recent Stuck transition timestamp instead of ctx.started_at (which reflects job start, not stuck time)
- Fix StuckJob.last_activity to use stuck transition timestamp
- Remove misleading "hot-reloaded into registry" log
- Remove stray "// ci fix" comment in memory.rs
- Add regression test: backdated started_at must not inflate stuck_duration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: re-trigger CI with latest changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add type annotation to Ok(()) in test to resolve E0282

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-03-19T04:41:32Z

Code review

Found 1 issue:

[HIGH:100] Missing builder field in AgentDeps construction in tests/e2e_telegram_message_routing.rs:183. The PR adds a required builder field to the AgentDeps struct (and updates all test fixtures), but this E2E test file's AgentDeps construction was not updated. This will cause a compilation failure.

ironclaw/tests/e2e_telegram_message_routing.rs

Lines 183 to 201 in 8b15f8b

    
    let deps = AgentDeps {
        owner_id: components.config.owner_id.clone(),
        store: components.db.clone(),
        llm: components.llm.clone(),
        cheap_llm: components.cheap_llm.clone(),
        safety: components.safety.clone(),
        tools: components.tools.clone(),
        workspace: components.workspace.clone(),
        extension_manager: components.extension_manager.clone(),
        skill_registry: components.skill_registry.clone(),
        skill_catalog: components.skill_catalog.clone(),
        skills_config: components.config.skills.clone(),
        hooks: components.hooks.clone(),
        cost_guard: components.cost_guard.clone(),
        sse_tx: None,
        http_interceptor: None,
        transcription: None,
        document_extraction: None,
    };

Lines 183-201: AgentDeps construction missing builder field. Other test files (test_rig.rs, gateway_workflow_harness.rs) were updated to include builder: None, but this one was missed.
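One way test fixtures can be made less sensitive to newly added optional fields is struct update syntax over a `Default` implementation. The sketch below is only an illustration under assumptions: this trimmed-down `AgentDeps`, the `SoftwareBuilder` marker trait, and the `test_deps` helper are hypothetical, and the real `AgentDeps` may not be able to implement `Default` because several of its fields are required handles.

```rust
use std::sync::Arc;

/// Hypothetical marker trait standing in for the real builder abstraction.
pub trait SoftwareBuilder: Send + Sync {}

/// Trimmed-down, hypothetical version of `AgentDeps` with only a few fields.
/// `Option<T>` is `Default` for any `T`, so the derive works even though
/// `dyn SoftwareBuilder` itself has no default value.
#[derive(Default)]
pub struct AgentDeps {
    pub owner_id: String,
    pub sse_tx: Option<String>,
    pub builder: Option<Arc<dyn SoftwareBuilder>>,
}

/// Test helper: sets only what the test cares about and lets every other
/// field fall back to its default, so a new optional field such as
/// `builder` does not break existing construction sites.
pub fn test_deps(owner_id: &str) -> AgentDeps {
    AgentDeps {
        owner_id: owner_id.to_string(),
        ..Default::default()
    }
}
```

The trade-off is that exhaustive struct literals, like the one flagged above, turn a forgotten field into a compile error, which is sometimes exactly the signal a team wants.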

claude · 2026-03-19T04:42:02Z

Additional findings

Found additional issues in telegram message splitting tests:

[MEDIUM:75] Test assertion will fail in test_split_message_word_boundary. The implementation intentionally drops whitespace at split boundaries (as documented in the code comment: "this drops leading/trailing spaces at split points"), but the test expects exact equality after rejoining: assert_eq!(rejoined, text). When splitting at line/paragraph boundaries or sentence ends, the intermediate newlines/spaces are lost, making this assertion incorrect.

ironclaw/channels-src/telegram/src/lib.rs

Lines 432 to 441 in 8b15f8b

    
            // Trim whitespace at chunk boundaries for clean Telegram display.
            // Note: this drops leading/trailing spaces at split points, which is
            // acceptable for chat messages but means the concatenation of chunks
            // may not exactly equal the original text when split at spaces.
            chunks.push(remaining[..split_at].trim_end().to_string());
            remaining = remaining[split_at..].trim_start();
        }
        chunks
    }

Lines 432-441: The comment states "Rejoined chunks must equal the original text exactly", but this contradicts the documented behavior at line 76-77 that whitespace is dropped at split boundaries.
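The tension between trimming at chunk boundaries and an exact-equality assertion can be seen in a small sketch. This `split_message` is a simplified, hypothetical splitter (it only cuts at the last ASCII space in the window or hard-cuts), not the function under review, and `normalize_ws` is an assumed helper showing one whitespace-insensitive way to compare.

```rust
/// Simplified, hypothetical splitter: cuts at the last space within the
/// first `max_chars` characters, or hard-cuts when the window has no space.
/// Whitespace at the cut is trimmed, as in the reviewed snippet.
pub fn split_message(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut remaining = text.trim();
    while remaining.chars().count() > max_chars {
        // Byte offset just past the first `max_chars` characters, so the
        // slice below always ends on a character boundary.
        let window_end = remaining
            .char_indices()
            .nth(max_chars)
            .map(|(i, _)| i)
            .unwrap_or(remaining.len());
        let window = &remaining[..window_end];
        let split_at = match window.rfind(' ') {
            Some(i) if i > 0 => i,
            _ => window_end,
        };
        chunks.push(remaining[..split_at].trim_end().to_string());
        remaining = remaining[split_at..].trim_start();
    }
    if !remaining.is_empty() {
        chunks.push(remaining.to_string());
    }
    chunks
}

/// Collapses every run of whitespace to a single space.
pub fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

With this sketch, splitting "alpha beta  gamma delta" at 11 characters yields "alpha beta" and "gamma delta": plain concatenation no longer equals the input, while comparing `normalize_ws` of the space-joined chunks against `normalize_ws` of the input still holds.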

claude · 2026-03-19T04:42:45Z

Performance & Production Issues

Found additional performance concerns:

[HIGH:HIGH] O(n) transition vector scan on every stuck job detection. In src/agent/self_repair.rs:123-128, the code calls .iter().rev().find() on the job's transitions vector for every stuck job. For jobs with hundreds of state transitions, this becomes expensive at each detection interval. Consider storing stuck_at: Option<DateTime> directly on JobContext to avoid repeated scans.

ironclaw/src/agent/self_repair.rs

Lines 123 to 128 in 8b15f8b

    
    for job_id in stuck_ids {
        if let Ok(ctx) = self.context_manager.get_context(job_id).await
            && ctx.state == JobState::Stuck
        {
            let stuck_duration = ctx
                .started_at
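The suggestion to store the stuck timestamp directly can be sketched as follows. Everything here is hypothetical: the `JobContext`, `Transition`, and `JobState` types are simplified stand-ins, and plain `u64` seconds replace the `DateTime` values the real code presumably uses. The sketch keeps the reverse scan next to the cached field so the two can be compared.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JobState {
    Pending,
    InProgress,
    Stuck,
}

pub struct Transition {
    pub to: JobState,
    pub at_secs: u64,
}

/// Hypothetical, simplified job context; timestamps are seconds.
pub struct JobContext {
    pub state: JobState,
    pub started_at_secs: u64,
    pub transitions: Vec<Transition>,
    /// Cached timestamp of the current Stuck episode, if any.
    stuck_at_secs: Option<u64>,
}

impl JobContext {
    pub fn new(started_at_secs: u64) -> Self {
        Self {
            state: JobState::Pending,
            started_at_secs,
            transitions: Vec::new(),
            stuck_at_secs: None,
        }
    }

    /// Records a transition and keeps the cached stuck timestamp in sync.
    pub fn transition(&mut self, to: JobState, at_secs: u64) {
        self.transitions.push(Transition { to, at_secs });
        self.state = to;
        self.stuck_at_secs = if to == JobState::Stuck { Some(at_secs) } else { None };
    }

    /// Scan variant: walks the history backwards to the latest Stuck entry.
    pub fn stuck_since_scan(&self) -> Option<u64> {
        self.transitions
            .iter()
            .rev()
            .find(|t| t.to == JobState::Stuck)
            .map(|t| t.at_secs)
    }

    /// Cached variant: constant time, measured from the Stuck transition
    /// rather than from `started_at_secs`.
    pub fn stuck_duration(&self, now_secs: u64) -> Option<Duration> {
        self.stuck_at_secs
            .map(|at| Duration::from_secs(now_secs.saturating_sub(at)))
    }
}
```

Because the reverse scan stops at the most recent Stuck entry, it is usually short for a job that is currently stuck; the cached field mainly removes the dependence on history length altogether.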

[MEDIUM:MEDIUM] Missing timeout on builder.build() call in repair task. In src/agent/self_repair.rs:261, the async builder.build() call has no explicit timeout. If the builder hangs, the entire repair task hangs, blocking the repair interval loop from continuing. Add tokio::time::timeout() wrapper.

ironclaw/src/agent/self_repair.rs

Line 261 in 8b15f8b

[MEDIUM:HIGH] Multiple redundant UTF-8 scans in split_message. In channels-src/telegram/src/lib.rs:394-399, the code calls char_indices().take(4096) on every window iteration. For a 1MB message, this repeats ~244 times with full re-scans. Consider caching or using skip() to avoid recomputation.
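If the per-window character scan ever does show up in profiles, one option is to compute all chunk boundaries in a single pass over `char_indices()`. The sketch below is an assumption-laden illustration: `chunk_ranges` is a hypothetical helper that only performs hard cuts every `max_chars` characters, whereas the real split_message also prefers paragraph, sentence, and word boundaries.

```rust
use std::ops::Range;

/// Returns byte ranges that cover `text` in order, each spanning at most
/// `max_chars` characters. The string is walked exactly once, and every
/// range starts and ends on a UTF-8 character boundary.
pub fn chunk_ranges(text: &str, max_chars: usize) -> Vec<Range<usize>> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut ranges = Vec::new();
    let mut start = 0; // byte offset where the current chunk begins
    let mut count = 0; // characters seen in the current chunk
    for (idx, _) in text.char_indices() {
        if count == max_chars {
            // The current chunk is full; close it at this character.
            ranges.push(start..idx);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < text.len() {
        ranges.push(start..text.len());
    }
    ranges
}
```

A caller could slice with `&text[range]` and then search backwards inside each slice for a preferred break point, adjusting the next start accordingly.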

…tion (#1400)

- Add `builder: None` to AgentDeps initializer in e2e_telegram_message_routing test (field added in #712 but test not updated)
- Update go_to_extensions() in test_telegram_hot_activation to navigate via settings tab -> extensions subtab (extensions tab was moved to settings)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: navigate telegram E2E tests to channels subtab

wasm_channel extensions (like telegram) are now rendered in the Settings → Channels subtab, not the Extensions subtab. Update test_telegram_hot_activation to navigate there and use the correct card selector. Also mock /api/gateway/status which loadChannelsStatus fetches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: select telegram card by name, not first card in channels subtab

Built-in channel cards (Web Gateway, HTTP, etc.) render first in the channels subtab content, so .first matches them instead of the telegram extension card. Select by has_text="Telegram" to target the correct card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: make gateway_status_handler parameterizable in mock helper

Address review feedback: extract default gateway status handler and accept an optional gateway_status_handler kwarg in mock_extension_lists for test flexibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

zmanian and others added 3 commits March 18, 2026 20:37

ironclaw-ci bot added the staging-promotion label Mar 19, 2026

github-actions bot added the scope: agent, scope: tool, size: XL, risk: medium, and contributor: core labels Mar 19, 2026

henrypark133 and others added 4 commits March 18, 2026 23:38

Merge pull request #1409 from nearai/staging-promote/07c6ca72-2330201…

0e3aa4f

…6242 chore: promote staging to staging-promote/b9e5acf6-23283208580 (2026-03-19 15:15 UTC)

Merge pull request #1402 from nearai/staging-promote/b9e5acf6-2328320…

656d1f3

…8580 chore: promote staging to staging-promote/3dcccc1e-23280048384 (2026-03-19 06:44 UTC)

henrypark133 merged commit e582166 into staging-promote/ec04354c-23271447493 Mar 19, 2026
13 checks passed

henrypark133 deleted the staging-promote/3dcccc1e-23280048384 branch March 19, 2026 15:58

henrypark133 merged 7 commits into staging-promote/ec04354c-23271447493 from staging-promote/3dcccc1e-23280048384
